1 Introduction

The ptairMS package provides a workflow to process PTR-TOF-MS raw data in the open Hierarchical Data Format 5 (HDF5; .h5 extension), and generate the peak table as an ExpressionSet object for subsequent data analysis with the many methods and packages available in R. Applications include the analysis of exhaled air, headspace or ambient air. The package offers several features to check the raw data and tune the few processing parameters. It also enables to include new samples in a study without re-processing all the previous data, providing a convenient management for cohort studies.

Proton Transfer Reaction - Mass Spectrometry (PTR-MS) has emerged with excellent sensitivity and specificity for VOC analysis in a wide range of applications, including environment, food quality, and biology (Blake, Monks, and Ellis 2009). In the area of health and care, PTR-MS opens up unique opportunities for real-time analysis “at the patient’s bedside”.

The three described approaches for the preprocessing of PTR-TOF-MS data to date, i.e. PTR-MS Viewer (Ionicon; commercial), PTR-TOF Data Analyzer (Müller et al. 2013), and PTRwid (Holzinger 2015), do not address the untargeted analysis of exhaled breath because i) they are either limited to targeted analysis or require very long computation time, ii) the have been developed for continuous environmental monitoring and do not take into account the time dimension, and iii) the algorithms cannot be accessed, nor modified, and such modules cannot be integrated into comprehensive bioinformatics workflows.

2 Volatolomics

Characterization of volatile organic compounds (VOCs) emitted by living organisms is of major interest in medicine, food sciences, and ecology. As an example, thousands of VOCs have been identified in the exhaled breath, resulting from normal metabolism or pathological processes (Lacy Costello et al. 2014). The main advantage of breath analysis in medicine is that the sampling is non-invasive (Devillier et al. 2017). Methods based on mass spectrometry (MS) are the reference technologies for VOC analysis because of their sensitivity and large dynamic range.

3 Workflow

First, a ptrSet object is generated with the function createPtrSet from the directory containing the raw files, possibly grouped into subfolders according to classes of samples. The object will contain all information about the study: the sample metadata, one peak list for each samples, and several quality metrics about the files. Second, the peak lists are aligned between samples and an ExpressionSet (Biobase package object) object is returned , containing the peak table, sample metadata, and feature metadata. Missing values in the dataMatrix can be replaced by the integrated signal in the expected raw data region. The features can be annotated with the annotateVOC function, based on the database described in (Lacy Costello et al. 2014).

4 ptairData package

Two real PTR-TOF-MS raw data sets are available on the ptairData package.

  • exhaled air of two healthy individal: three acquisitions per individual on diffrents day with two exprirations

  • mycobateria culture air analysis: two replicates of two species and control (culture medium)

For memory reasons, the raw data are truncate in mass axis at the range: \([20.4,21.6] U [50 , 150]\) for individual, and $ [20.4 ; 21.6] U [56.4 ; 90.6]$ for mycobacteria.

5 Hands on

library(ptairMS)
library(ptairData)

5.1 Checking raw data andsetting parameters

Start the analysis from a raw data directory. For this example, we use one of the ptairData package:

dirRaw <- system.file("extdata/exhaledAir", package = "ptairData")

Before processing data, there is two important steps:

  • calibrate the mass axis

  • determine the interest acquisition time periods (expirations or head space analysis for our examples).

To perform and check the quality of this steps, we first use the createPtrSet function, that will indivadually for each file read the raw data, calibrate the mass axis with mzCalibRef parameter, determine expirations or head space analysis limits that depend on fracMaxTIC parameter (see Determine expirations or head spaces analysis limits section), and provide a default sampleMetadata based on the tree structure of the directory.

This function creates a ptrSet object that contains all necessary information about the files in the study directory: calibration prameters, time limits, peak lists (as this step it is empty), and primary ion quanitfication. It can be update if new files are added are deleted from the directory with the function updatePtrSet (see update section).

To create a ptrSet object form the raw data directory:

exhaledSet <- createPtrSet(dirRaw,
                     setName="exhaledSet",
                     mzCalibRef = c(21.022, 60.0525),
                     fracMaxTIC = 0.7,
                     saveDir = NULL )
## ind1-1.h5  check
## ind1-2.h5  check
## ind1-3.h5  check
## ind2-1.h5  check
## ind2-2.h5  check
## ind2-3.h5  check
exhaledSet
## ptrSet object : exhaledSet 
## directory: C:/Users/CR258086/Documents/R/R-3.6.1/library/ptairData/extdata/exhaledAir 
##     6 files contains in the directory 
##     6 files check 
##     0 files processed by detectPeak

To get the list of file names in your directory (at the last time the ptrSet was created or updated), use :

getFileNames(exhaledSet)
## [1] "ind1-1.h5" "ind1-2.h5" "ind1-3.h5" "ind2-1.h5" "ind2-2.h5" "ind2-3.h5"

We will now quickly explain each steps of this function, and show how to choose and check the quality of mzCalibRef and fracMaxTIC parameters.

5.1.1 Calibration of mass axis

To convert Time Of Flight (TOF) axis to mass/charge ratio (mz) axis, the following formula is used (Müller et al. (2013)):

  • \(mz = \Big (\frac{TOF-b}{a} \Big)^2\)

To estimate parameters (a , b), reference peaks with known masses (and without overlapping peaks) are needed. For optimal results the mass axis calibration peaks have to be well distributed over the entire m/z. Set those masses in the mzCalibRef parameter. For exhaled air, we propose for addition masses of ionicon:

calib_table<-read.csv(system.file("extdata", "reference_tables/calib_table.tsv", package = "ptairMS"),sep="\t")
knitr::kable(calib_table)
Formula mz compound
H3 (18O)+ 21.02204 isotope of primary ion (H30+)
N2H+ 29.01342 Diazote
C3H5+ 41.03858
C3H7O+ 60.05250 isotope of Acetone
NA 203.94300 Ionicon Calibration mass 1
NA 330.84950 ionicon Calibration mass 2
Other refere nce masses ( at least two) can be provided by the user.

To check the calibration accurancy, you can use the plot method of ptrSet, that provides trhee plot calibError the calibration error boxplot in ppm, an estimation of the resolution \(\frac{m}{\Delta_m}\) for the mzCalibRef peaks and average normalized peak shape of the mzCalibRef.

plot(exhaledSet)

If a masses contains one or several files outlier (error > 20 ppm), you can check those files with the function plotCalib, that plot for selected files the total ion spectrum around all the mzCalibRef (with the exact masses as red vertical lines).

plotCalib(exhaledSet,fileNames=getFileNames(exhaledSet)[5])

You can then choose to keep or delete this masses from mzCalibRef. To change it, use calibration function on the ptrSte object.

Since we have truncate the mass axis, we can’t use masses all the masses listed above. To calibrate with at least three masses, we add the peak 75.04406 ([C3H6O2+H]+,Hydroxyacetone).

exhaledSet <- calibration(exhaledSet, mzCalibRef =  c(21.022, 60.0525,75.04406))

It will return the same object, with the calibration parameters updated.

plot(exhaledSet,type="calibError")

5.1.2 Determine expirations or head spaces analysis limits

The expirations or head space analysis are determined on the Total Ion Chromatogram. There correspond to the part of the TIC where intensity is higher than fracMaxTIC * max(TIC) with baseline removal. To analyze the entire spectrum (ambiant air), set fracMaxTIC to 0.

Example of expirations detetcion at fracMaxTIC = 0.5

Example of expirations detetcion at fracMaxTIC = 0.9

To see the expiration limits on all files in the ptrSet use plotTIC function:

  • plotTIC: This function plot the Total Ion Spectrum for several files. By default, when fileNames is NULL, all TICs files are plotted. A pdf file is generated with the pdfFile argument by give an absolute file path with the pdf extension. Finally, you can remove the baseline by set baselineRm = TRUE, and add the time limits to the plot by set showLimits = TRUE. A coloration by a column of the sampleMetadat is also posible by set the column name in colorBy parameter.
plotTIC(object = exhaledSet, showLimits = TRUE, pdfFile="AllTIC.pdf")
plotTIC(object = exhaledSet, showLimits = TRUE,fileNames=getFileNames(exhaledSet)[1],type = "ggplot")

To change the fracMaxTIC paramter, use timeLimits function :

exhaledSet<-timeLimits(exhaledSet,fracMaxTIC = 0.8)

An interface to manually change the limit of expiration is in progress.

5.1.3 Managing sample metadata

The createPtrSet function automatically generrates a default sampleMetadata ‘data.frame’. It contains file names in row names, and a column named ‘subfolder’ when the files are organized into subfolders in the parent directory.=. To get this data.frame, use the getSampleMetadata method. Row names of the sampleMetadata must always contains all file names from the directory.

getSampleMetadata(exhaledSet)
##           subfolder
## ind1-1.h5      ind1
## ind1-2.h5      ind1
## ind1-3.h5      ind1
## ind2-1.h5      ind2
## ind2-2.h5      ind2
## ind2-3.h5      ind2

You can at any moment obtain this default sample metada data with the function resetSampleMetadata(exhaledSet)

To modify the sample metadata, there exist two diffrent ways:

  • setSampleMetadatamethod: modify the data.frame and put it into the ptrSet object.
sampleMD <- getSampleMetadata(exhaledSet)
colnames(sampleMD)[1] <- "individual"  

exhaledSet <- setSampleMetadata(exhaledSet,sampleMD)
getSampleMetadata(exhaledSet)
##           individual
## ind1-1.h5       ind1
## ind1-2.h5       ind1
## ind1-3.h5       ind1
## ind2-1.h5       ind2
## ind2-2.h5       ind2
## ind2-3.h5       ind2
  • exportSampleMetada and importSampleMetadata methods:exportSampleMetada save the data.frame in .tsv format in the directory of your choice. Parameter saveFile must be in .tsv extension. You can open and modify it on any spreadsheet editor, but never change the row names: if is missing one file in row names, there will an error. Once the .tsv file has been modified, it can be imported back into the ptrSet object, with the function importSampleMetadata. It return the same ptrSet object, but with with the updated sampleMetadata.
exportSampleMetada(exhaledSet, saveFile = paste(DirBacteria,"sampleMetadata.tsv",sep="/"))
exhaledSet <- importSampleMetadata(exhaledSet, file = paste(DirBacteria,"sampleMetadata.tsv",sep="/"))

5.1.4 Saving

Finally, the function saves the object as .Rdata in saveDir directory if it is not NULL, with setName as name. Then, to import the saved ptrSet object, use load base function.

5.1.5 Plot raw data

There exist two functions for plotting the raw data:

  • plotRaw: Displays for a file the image of the matrix of intensities, the EIC, the sum spectrum, for the selected m/z and time ranges and annotation of known VOC possibly contains in the mz range (just for infomation, no peak detection have been done as this step).
plotRaw(exhaledSet, mzRange = 59 , fileNames = getFileNames(exhaledSet)[1],showVocDB = T)

##    mz_Hplus formula_Hplus                       cas.name
## 33 59.08552    [C4H10+H]+       2-methylpropane|n-butane
## 32 59.04914    [C3H6O+H]+ acetone|2-propen-1-ol|propanal
  • plotFeatures: for all selected files, the average spectrum for all time periods is plotted, with the background superimposed as a dashed line. As plotTIC function, you can choose type plotly or ggplot, and coloring spectrum by a column name of the sample Metadata
plotFeatures(exhaledSet,mz=21.022)
plotFeatures(exhaledSet,mz=59.049,type="ggplot",colorBy = "individual")

5.2 update the ptrSet

If you want to delete or added files to the directory after created the ptrSet, modify in the directory and run updatePtrSet function.

exhaledSet <- updatePtrSet(exhaledSet)
## nothing to update

5.3 Peak detection

Now we checked if the calibration is efficient, and if the expiration or head space are well determining. The, we can process all files with the detectPeak function.

For each file in the data set, this function :

  • calibrates each expiration and ambiant air

  • detects peak in each expiration and ambiant air (see annex 1 for peak detection algorithm)

  • aligns features between the background and diffrents analyticaltime periods (i.e. headspace or expirations)

  • keeps features present in at leat fracGroup % of expirations (if exhaled air)

  • if normalize=TRUE, normalizes with the primary ion (devided all features by the quantity of H3(O18)+ *488), and converts in ppb if the ptr reaction information are available in the h5 file (voltage, temperature and pressure of the drift)

The peakList is then written in the ptrSet object: a list of data.table, which contains for each file all peaks detected, with quantification of analytical time periods and background (ambient air).

exhaledSet <- detectPeak(exhaledSet ,  mzNominal = NULL,fracGroup=1,normalize = T)
## ind1-1.h5 : 133 peaks detected 
## ind1-2.h5 : 139 peaks detected 
## ind1-3.h5 : 126 peaks detected 
## ind2-1.h5 : 137 peaks detected 
## ind2-2.h5 : 145 peaks detected 
## ind2-3.h5 : 136 peaks detected

If you want to restrinct the detection to few nominal mass, set them in mzNominal, for example exhaledSet <- detectPeak(exhaledSet , mzNominal = c(21,59)).

This step can take few times if there are many files and a large mz range. You can then parallelize the algorithm by set parallel = TRUE and give the number of cores in nbCores (the number of files that will be processed simultaneous). To see the number of CPU cores available in your computer, use parallel::detectCores().

To see the resulting peak lists, use getPeakList method. It return a list with two elements:

  • raw: contains the peak List obtained of each expiration and background contatened, with the following column :

    • mz: the mass of the peak center
    • quanti: the concentration in count per extraction
    • delta_mz: the FWHM (Full Weigth Half Maximum) of the peak in Da
    • resultion: mz/delta_mz
    • group: the label of the analitycal time period: 0 for background, and 1, 2 … for the expirations/headSpace number
  • aligned: contains the peak list obtained after alignment beteween background and time periods:

    • mz: the mass of the peak center
    • quanti: the mean of the quantification over all time periods
    • delta_mz: the mean FWHM (Full Weigth Half Maximum) over all time periods
    • fracExp: percentage of times periods that contains this peak
    • background: the quantification in the ambiant air

if fracMaxTIC is set to zero at the begining, ranw and aligned peakList are the same

peakList<-getPeakList(exhaledSet)
head(peakList$aligned$`individual4-1.h5`)
## NULL

5.4 How to update ptrSet peakList

The ptairMs package allow to update the ptrSet object linked to a directory, as the data is acquired. When you acquire new data in the directory, first apply updatePtrSet function to the object, reset the sample metadata or complet it, and then apply detectPeak function.

exhaledSet<-updatePtrSet(exhaledSet)
exhaledSet<-setSampleMetadata(exhaledSet,resetSampleMetadata(exhaledSet))
exhaledSet<-detectPeak(exhaledSet)

5.5 Aligning samples

The alignment between samples (i.e. matching of variables between the peak lists of the samples) is performed by using a kernel gaussian density (Delabrière et al. 2017). At this stage, the table of peak intensities has been built, and will be handled together with the sample meta data (from ptrSet) and variable meta data containing the background peak table, as an ExpressionSet object.

It is possible to apply two filters:

  • On the background: only variables with intensities greater than bgThreshold * background are kept

  • On samples reproducibility: only variables which are detected in more than \(fracGroup\) % of samples are kept.

If you do not want to apply those filters, set fracGroup and bgThreshold to 0.

eSet <- alignSamples(exhaledSet, group="individual", fracGroup = 1, bgThreshold = 4 )
## 31 peaks aligned

To see those three tables contains in the ExpressionSet:

knitr::kable(head(Biobase::exprs(eSet)))
ind1-1.h5 ind1-2.h5 ind1-3.h5 ind2-1.h5 ind2-2.h5 ind2-3.h5
51.0442 2.9207285 2.8951917 2.4713660 8.839019 6.8739324 11.2131810
55.0389 45.4250743 29.9315739 34.9467164 53.857206 53.8403746 40.9425155
57.033 2.5827417 2.6633915 2.0290540 2.650103 6.6955009 2.5530096
58.0401 0.5620376 0.7557753 0.5553208 0.559625 0.5495952 0.6863493
59.0498 977.3908299 1461.1439782 1105.9457217 965.571330 746.9824237 1365.1584509
60.0526 28.7531310 41.2103019 30.9498646 30.028931 22.5069782 40.2432164
knitr::kable(head(Biobase::fData(eSet)))
Background - ind1-1.h5 Background - ind1-2.h5 Background - ind1-3.h5 Background - ind2-1.h5 Background - ind2-2.h5 Background - ind2-3.h5
51.0442 0.0553349 0.0869217 NaN NaN 0.0525769 0.0522817
55.0389 0.7546768 0.9956533 0.6909736 0.6928665 0.5618447 0.6924735
57.033 1.1260609 1.4269247 0.8211809 0.8360225 1.1612819 1.1188522
58.0401 0.0751487 0.1386051 0.0777504 0.0782107 0.1064082 0.0915683
59.0498 24.6982617 51.6309914 17.9742560 19.1219327 36.5289178 29.5324392
60.0526 0.9117387 1.7352323 0.7091926 0.7350493 1.2895062 1.1039797
knitr::kable(head(Biobase::pData(eSet)))
individual
ind1-1.h5 ind1
ind1-2.h5 ind1
ind1-3.h5 ind1
ind2-1.h5 ind2
ind2-2.h5 ind2
ind2-3.h5 ind2

5.6 Imputation

To imputue missing values, ptairMS returns back to the raw data, and compute with the same method as detectPeak the missing values without limit of detection.

eSet <- ptairMS::impute(eSet,  exhaledSet)

5.7 Annotation

ptairMS propose a putative annotation based on the database described in (Lacy Costello et al. 2014),with the annotateVOC function. Applied to an expression set, it write in the features metadata all possible formulas and names found.

annotateVOC(59.049)
##        vocDB_mz_Hplus vocDB_formula_Hplus                   vocDB_cas.name
## 59.049       59.04914          [C3H6O+H]+ 2-propen-1-ol, acetone, propanal
eSet<-annotateVOC(eSet)
knitr::kable(head(Biobase::fData(eSet)))
Background - ind1-1.h5 Background - ind1-2.h5 Background - ind1-3.h5 Background - ind2-1.h5 Background - ind2-2.h5 Background - ind2-3.h5 vocDB_mz_Hplus vocDB_formula_Hplus vocDB_cas.name
51.0442 0.0553349 0.0869217 NaN NaN 0.0525769 0.0522817
55.0389 0.7546768 0.9956533 0.6909736 0.6928665 0.5618447 0.6924735
57.033 1.1260609 1.4269247 0.8211809 0.8360225 1.1612819 1.1188522 57.03349 [C3H4O+H]+ 2-propenal
58.0401 0.0751487 0.1386051 0.0777504 0.0782107 0.1064082 0.0915683
59.0498 24.6982617 51.6309914 17.9742560 19.1219327 36.5289178 29.5324392 59.04914 [C3H6O+H]+ 2-propen-1-ol, acetone, propanal
60.0526 0.9117387 1.7352323 0.7091926 0.7350493 1.2895062 1.1039797

Then you can export the expressionSet with the function writeEset into 3 tabulated files ‘dataMatrix.tsv’, sampleMetadata.tsv’, and ‘variableMetadata.tsv’:

writeEset(eset, dirC = file.path(getwd(), "processed_dataset"))

5.8 Statistical analysis

You can now use known package as ropls to performs PCA or further statistical analysis.

X<-Biobase::exprs(eSet)
SMD<-Biobase::pData(eSet)
features<-Biobase::fData(eSet)
ropls::view(log2(X),printL = FALSE)

pca<-ropls::opls(t(log2(X)),crossvalI=5,info.txtC="none",fig.pdfC="none")
plot(pca,type="overview",parAsColFcVn=SMD$individual)

plot(pca,type="x-score",parAsColFcVn=SMD$individual)

plot(pca,type="x-loading")

dim1<-ropls::getLoadingMN(pca)[,1]
barplot(sort(abs(dim1),decreasing = T))

knitr::kable(features[names(sort(abs(dim1),decreasing = T)),c("vocDB_formula_Hplus","vocDB_cas.name")])
vocDB_formula_Hplus vocDB_cas.name
67.0538 [C5H6+H]+ 1,3-cyclopentadiene
70.0732
69.0699 [C5H8+H]+, [H8C5+H]+ (E)-, 1,3-pentadiene, 3-methyl-1-butyne, cyclopentene, isoprene, pentyne ui
68.0598
109.0663 [C7H8O+H]+ 1-hydroxy-3-methylbenzene, 4-methylphenol , benzyl alcohol, p-cresol
138.1339
51.0442
137.131 [C10H16+H]+ 1-methyl-4-(1-methylethyl)-1,3-cyclohexadiene, 1-methyl-4-(1-methylethyl)-1,4-cyclohexadiene, 7-methyl-3-methylene-1,6-octadieneFBrSa, 1-methyl-4-(1-methylethylidene)-cyclohexene, 2-methyl-5-(1-methylethyl)-bicyclo[3.1.0]hex-2-ene, 3-thujene , 3,7-dimethyl-1,3,6-octatriene, 3,7,7-trimethyl-bicyclo[4.1.0]hept-3-ene, 4-methylene-1-(1-methylethyl)-bicyclo[3.1.0]hexane, alpha-pinene, alpha-terpinene, beta-myrcene, beta-phellandrene, beta-pinene, beta-terpinene, camphene, cis-beta-ocimene, d-limonene, delta terpinene, dl-limonene, gamma-terpinene, terpinolene , trans-beta-ocimene, tricyclene
81.0694 [C6H8+H]+ 1,3-cyclohexadiene, 1,4-cyclohexadiene
55.0389
82.0721
95.0851
96.0872
57.033 [C3H4O+H]+ 2-propenal
87.0801 [C5H10O+H]+, [H10C5O+H]+ 2-methylbutanal, 2-pentanone, 3-methyl-2-buten-1-ol, 3-methyl-3-buten-1-ol, 3-methylbutanal, 3-pentanone, E-2-penten-1-ol, methylbutanal, pentanal
76.0476
71.0487 [C4H6O+H]+ 2-butenal , 2,3-dihydrofuran, 3-buten-2-one , crotonaldehyde, methyl vinyl ketone
59.0498 [C3H6O+H]+ 2-propen-1-ol, acetone, propanal
96.0532
95.0492 [C6H6O+H]+ phenol
60.0526
65.0593
75.0438 [C3H6O2+H]+ 1-hydroxy-2-propanone, 1,3-dioxolan, acetic acid, ethyl ester , ethyl formate, methanoic acid, methyl ester, propanoic acid , propionic acid
58.0401
89.0589 [C4H8O2+H]+ 1,4-diethylene dioxide, 2-methylpropanoic acid, acetic acid, butanoic acid , butyric acid, ethyl acetate, ethyl ester, methyl ester, propanoic acid
66.0645
97.0278 [C5H4O2+H]+, [H4C5O2+H]+ 5-hydroxy-2,4-pentadienoicacid, 2-pyranone, 3-furaldehyde, delta-lactone, furaldehyde, furaldehyde ui, furfural
63.0264 [C2H6S+H]+ dimethylsulfide , thiopropane
77.0597 [C3H8O2+H]+ 1-hydroxy-2-methoxyethane, 1,2-propanediol, dimethoxymethane
63.0073 [CH2O3+H]+ carbonic acid
78.0626

5.9 Processing only one raw file

To read and access to the entire raw data of one file, use readRaw function. It opens the data from a absolute file path whith rhdf5 library, and returns a ptrRaw object containing the raw data matrix, the time axis, the m/z axis obtained after calibration if calibTIS = TRUE, and information contained in the h5 file (transmission curve, drift temperature, …).

To get a absolute file path of one file:

samplePath <-list.files(dirRaw,recursive = T,full.names = T,pattern = ".h5$")[1]

To read the file :

sampleRaw <- readRaw(samplePath, calibTIS = F)
sampleRaw
## ind1-1.h5 
##   mz range:  20.4 - 150.7 
##   time range:  0 - 49 
##   Calibration error in ppm: 
##      No calibration performs

After that, you can use same method as ptrSet object: calibration, timeLimits, plotCalib, plotTIC, plotRaw, and detectPeak.

6 Acknowledgements

This research was developed within the Data Sciences for Deep Phenotyping and Precision Medicine team at CEA (C. Roquencourt, P. Zheng, P. Roger-Mele and E. Thévenot) in collaboration with the Exhalomics platform from the Foch Hospital (S. Grassin-Delyle, P. Devillier, H. Salvator, E. Naline and L.-J. Couderc) within the SoftwAiR project supported by the Agence Nationale de la Recherche (ANR-18-CE45-0017 grant).

7 Annex

7.1 Peak detection and quanitfication methods

For each nominal mass m/z:

  • Selection of the m/z range: [m/z - 0.6, m/z + 0.6] and definition of m/z windows for noise ([m/z - 0.6, m/z - 0.4] U [m/z + 0.4, m/z + 0.6]) and signal ([m/z - 0.4, m/z + 0.4])

  • Removing baseline (CLAYTON et al., n.d.)

  • Smoothing the signal with Savitzky Golay windows (Savitzky and Golay 1964) computed by using noise autocorrelation (Vivó-Truyols and Schoenmakers 2006)

  • Detection of peak maxima by using the derivatives of Savitzky Golay filter (all concave regions are determined with the second derivative, and the maxima correspond in each of those regions to the point with the first deriative closest to zero)

  • Thresholding to the noise maximum arround the mass

  • Iterrative fit function sech2: \(peak_i(h_i,m_i,\Delta_{1i},\Delta_{2i}) = {h_i}/{(cosh(\Delta_{1i}*(x-m_i))^2*(x<=m_i)+cosh(\Delta_{2i}*(x-m_i))^2*(x>m_i))}\), with initialisation and constains:

    • m : mass of the peak maxima detected \(+- [\Delta/4]\)
    • \(\Delta_1\)= \(\Delta_2\) = m/5000 (resolution mean)
    • h = intensity of the peak at the center
  • Quantification by summing the fitted peaks over a range of 10*\(\Delta_i\)

peakList<-PeakList(sampleRaw, mzNominal = c(63), 
                                plotFinal=TRUE ,plotAll =TRUE)

peakList$peak
##         mz   quanti   delta_mz resolution        R2 parameter.1 parameter.2
## 1 63.00499 2078.309 0.01242852   5069.388 0.9992448   165.30514    124.1945
## 2 63.02435 3323.928 0.01148845   5485.889 0.9992448   158.90175    148.3347
## 3 63.04147 1042.109 0.01692215   3725.382 0.9992448    83.88503    137.3878
##   parameter.3
## 1   264.75165
## 2   458.04107
## 3    97.51366

8 Session Info

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
## [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
## [5] LC_TIME=French_France.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ptairData_0.1.0   ptairMS_0.0.0     data.table_1.12.8 BiocStyle_2.14.2 
## 
## loaded via a namespace (and not attached):
##   [1] ProtGenerics_1.18.0   bit64_0.9-7           RColorBrewer_1.1-2   
##   [4] doParallel_1.0.15     httr_1.4.1            MSnbase_2.12.0       
##   [7] tools_3.6.1           backports_1.1.5       R6_2.4.1             
##  [10] affyio_1.56.0         rpart_4.1-15          Hmisc_4.3-0          
##  [13] lazyeval_0.2.2        BiocGenerics_0.32.0   colorspace_1.4-1     
##  [16] nnet_7.3-12           tidyselect_0.2.5      gridExtra_2.3        
##  [19] bit_1.1-14            compiler_3.6.1        preprocessCore_1.48.0
##  [22] Biobase_2.46.0        htmlTable_1.13.3      plotly_4.9.1         
##  [25] labeling_0.3          bookdown_0.16         checkmate_1.9.4      
##  [28] scales_1.1.0          affy_1.64.0           stringr_1.4.0        
##  [31] digest_0.6.23         foreign_0.8-73        rmarkdown_2.0        
##  [34] jpeg_0.1-8.1          base64enc_0.1-3       pkgconfig_2.0.3      
##  [37] htmltools_0.4.0       fastmap_1.0.1         limma_3.42.0         
##  [40] highr_0.8             htmlwidgets_1.5.1     rlang_0.4.2          
##  [43] rstudioapi_0.10       impute_1.60.0         shiny_1.4.0          
##  [46] farver_2.0.1          jsonlite_1.6          crosstalk_1.0.0      
##  [49] mzID_1.24.0           BiocParallel_1.20.0   ropls_1.18.0         
##  [52] acepack_1.4.1         dplyr_0.8.3           magrittr_1.5         
##  [55] Formula_1.2-3         Matrix_1.2-18         MALDIquant_1.19.3    
##  [58] Rcpp_1.0.3            munsell_0.5.0         S4Vectors_0.24.1     
##  [61] Rhdf5lib_1.8.0        lifecycle_0.1.0       vsn_3.54.0           
##  [64] stringi_1.4.3         yaml_2.2.0            MASS_7.3-51.4        
##  [67] zlibbioc_1.32.0       rhdf5_2.30.1          plyr_1.8.5           
##  [70] grid_3.6.1            parallel_3.6.1        promises_1.1.0       
##  [73] crayon_1.3.4          lattice_0.20-38       splines_3.6.1        
##  [76] mzR_2.20.0            zeallot_0.1.0         knitr_1.26           
##  [79] pillar_1.4.2          codetools_0.2-16      stats4_3.6.1         
##  [82] XML_3.98-1.20         glue_1.3.1            evaluate_0.14        
##  [85] latticeExtra_0.6-29   pcaMethods_1.78.0     BiocManager_1.30.10  
##  [88] png_0.1-7             vctrs_0.2.1           httpuv_1.5.2         
##  [91] foreach_1.4.7         gtable_0.3.0          purrr_0.3.3          
##  [94] tidyr_1.0.0           assertthat_0.2.1      ggplot2_3.2.1        
##  [97] xfun_0.11             mime_0.8              xtable_1.8-4         
## [100] later_1.0.0           ncdf4_1.17            survival_3.1-8       
## [103] viridisLite_0.3.0     signal_0.7-6          minpack.lm_1.2-1     
## [106] tibble_2.1.3          iterators_1.0.12      IRanges_2.20.1       
## [109] cluster_2.1.0         ellipsis_0.3.0

Bibliography

Blake, Robert S., Paul S. Monks, and Andrew M. Ellis. 2009. “Proton-Transfer Reaction Mass Spectrometry.” Chemical Reviews 109 (3): 861–96. doi:10.1021/cr800364q.

CLAYTON, C G RYAN, E Clayton, W L Griffin, and S H Sie. n.d. “SNIP, A STATISTICS-SENSITIVE BACKGROUND TREATMENT FOR THE QUANTITATIVE ANALYSIS OF PIXE SPECTFtA IN GEOSCIENCE APPLICATIONS,” 7.

Delabrière, Alexis, Ulli M Hohenester, Benoit Colsch, Christophe Junot, François Fenaille, and Etienne A Thévenot. 2017. “proFIA: A Data Preprocessing Workflow for Flow Injection Analysis Coupled to High-Resolution Mass Spectrometry.” Edited by Oliver Stegle. Bioinformatics 33 (23): 3767–75. doi:10.1093/bioinformatics/btx458.

Devillier, Philippe, Helene Salvator, Emmanuel Naline, Louis-Jean Couderc, and Stanislas Grassin-Delyle. 2017. “Metabolomics in the Diagnosis and Pharmacotherapy of Lung Diseases.” Current Pharmaceutical Design 23 (14). doi:10.2174/1381612823666170130155627.

Holzinger, R. 2015. “PTRwid: A New Widget Tool for Processing PTR-TOF-MS Data.” Atmospheric Measurement Techniques 8 (9): 3903–22. doi:10.5194/amt-8-3903-2015.

Lacy Costello, B de, A Amann, H Al-Kateb, C Flynn, W Filipiak, T Khalid, D Osborne, and N M Ratcliffe. 2014. “A Review of the Volatiles from the Healthy Human Body.” Journal of Breath Research 8 (1): 014001. doi:10.1088/1752-7155/8/1/014001.

Müller, Markus, Tomáš Mikoviny, Werner Jud, Barbara D’Anna, and Armin Wisthaler. 2013. “A New Software Tool for the Analysis of High Resolution PTR-TOF Mass Spectra.” Chemometrics and Intelligent Laboratory Systems 127 (August): 158–65. doi:10.1016/j.chemolab.2013.06.011.

Savitzky, Abraham., and M. J. E. Golay. 1964. “Smoothing and Differentiation of Data by Simplified Least Squares Procedures.” Analytical Chemistry 36 (8): 1627–39. doi:10.1021/ac60214a047.

Vivó-Truyols, Gabriel, and Peter J. Schoenmakers. 2006. “Automatic Selection of Optimal Savitzky−Golay Smoothing.” Analytical Chemistry 78 (13): 4598–4608. doi:10.1021/ac0600196.